in silico Plants — Latest Matching Preprints

1

Kinetic model of a determinate legume root nodule reveals plant metabolic characteristics for more efficient nitrogen fixation symbiosis

Ji, R.; Kaste, J. A. M.; Matthews, M. L.

2026-05-01 plant biology 10.64898/2026.04.28.721409 medRxiv

Top 0.1%

45.5%

While nitrogen fertilizers are widely used in agricultural production, their application incurs significant environmental and energetic costs. In contrast, some crops are less dependent on these fertilizers because they engage in symbioses with rhizobia, nitrogen-fixing bacteria that provide ammonium to the plant in exchange for carbon. However, the carbon cost associated with nitrogen fixation can negatively impact crop yields. Improving the efficiency of this metabolic process could alleviate this impact on crop productivity. Mathematical models can help us quantitatively explore metabolic behavior and identify potential targets for metabolic engineering. In this work, we developed a kinetic model of determinate root nodule metabolism, where this symbiotic exchange of carbon from the plant and nitrogen from the bacteria occurs. We used this model to evaluate how the predicted metabolic behavior differs between inefficient and efficient nodules, and to identify potential engineering targets for improving nitrogen fixation efficiency and rate. We show that the enzymes phosphoenolpyruvate carboxylase and pyruvate kinase have significant influence on the predicted rate and efficiency of nitrogen fixation, especially when their expression is varied in combination with oxidative pentose phosphate pathway enzymes like glucose-6-phosphate dehydrogenase and 6-phosphogluconolactonase. The model predicts that pairing a 3-fold decrease in glucose-6-phosphate dehydrogenase activity with either a 3-fold increase in phosphoenolpyruvate carboxylase activity or decrease in pyruvate kinase activity could increase nitrogen fixation rate by 5.51% while improving nitrogen fixation efficiency by 7.74%.

2

A guaranteed-convergence algorithm for coupled leaf photosynthesis–transpiration–stomatal conductance models

Masutomi, Y.; Kobayashi, K.

2026-07-08 Plant Biology 10.64898/2026.06.24.734164 medRxiv

Top 0.1%

13.1%

The photosynthesis-transpiration-stomatal conductance (An-E-gs) model framework is widely used for estimating photosynthesis, transpiration, and stomatal conductance in plants. The model equations are solved by numerical iteration, and the converged model values are deemed the solution. However, there has been no general guarantee that the iterative procedure converges to a solution or that the procedure leads to convergence. Building on the recent proof of the existence of a unique set of solutions, we herewith propose a numerical algorithm that is guaranteed to converge to the solution for the An-E-gs model framework. We first analytically prove that the proposed algorithm necessarily converges to a solution. We then demonstrate the convergence across contrasting combinations of leaf temperature, relative humidity, light, atmospheric CO2, and wind speed. We further demonstrate rapid convergence with the algorithm: no more than ca. 10 iterations for approximately 10^-3 mol CO2 m^-2 s^-1 precision in net photosynthesis and no more than ca. 20 iterations for 10^-7 mol CO2 m^-2 s^-1 precision. By guaranteeing convergence to the solution, this algorithm eliminates concerns about nonconvergence in leaf gas-exchange calculations and is expected to serve as a robust foundation for a range of studies from leaf-level gas exchange to global-scale carbon and water cycle dynamics.

3

Modeling of Glucosinolate Biosynthesis During Biotic Stress as a Function of mRNA

Earle, J.; Neefjes, A. C. M.; Ploeger, X. S. D.; van Laar, M.; Van Wees, S. C. M.; Schuurink, R. C.; van Dijk, A. D. J.; Bleeker, P.; Hoefsloot, H.

2026-05-30 systems biology 10.64898/2026.05.29.728632 medRxiv

Top 0.1%

12.2%

Glucosinolates are an important group of specialized metabolites in the Brassicaceae family, playing a role as defensive compounds against biotic attackers. In response to biotic stress, plants upregulate glucosinolate biosynthesis in part by increasing the abundance of enzymes in the glucosinolate biosynthetic pathway. As an increase in enzyme abundance is often preceded by an increase in the corresponding mRNA levels, the dynamic changes in mRNA levels should capture the information required to infer how metabolite levels change over time. In order to test this hypothesis, a time series of experimental glucosinolate content data collected from Arabidopsis thaliana, exposed to either a mock or methyl jasmonate (MeJA) treatment, as a proxy for biotic stress, was combined with existing mRNA abundance data over time at the same developmental stage and treatment. We propose the GEEM model, a multilevel mechanistic ordinary differential equation (ODE) model, which goes from gene expression to an enzyme-level model, followed by a Michaelis-Menten kinetics metabolite model, to simulate the dynamics of a segment of the indolic glucosinolate pathway. In order to constrain the GEEM model, three models were fit to experimental de novo specialized metabolite data, using different degrees of freedom by utilizing both a Gradient Boosted Tree model with a tested architecture to predict the kinetic constants, and augmenting these predictions with a literature review of the known Michaelis-Menten kinetic constants from the glucosinolate pathway. Using Sequential Monte Carlo Approximate Bayesian Computation to fit the GEEM model to the experimental data, we showed that given the mRNA levels and initial concentrations of metabolites, the changes in specialized metabolites over time and treatment can be modeled.

Author Summary

We study how plants adjust their natural chemical defenses over time when they are under attack from living organisms. In the mustard family, including the subject of our experiment, Arabidopsis, one important group of defense chemicals is called glucosinolates. When Arabidopsis is under attack, certain gene pathways can be activated or deactivated, allowing the plant to modulate the amount of enzymes it produces, which in turn modulates the levels of these defensive chemicals. In this work, we combine measurements of gene activity and glucosinolate levels from Arabidopsis treated with a stress-signalling compound that mimics insect or pathogen attack. We then constructed a mathematical model that goes from gene activity, to the amount of enzyme present, and ends with the amounts of specific glucosinolates over time. By fitting this model to experimental data, we show that it is possible to predict how glucosinolate levels change over time from the gene activity and initial glucosinolate levels. Our approach offers a way to connect gene expression datasets to real changes in plant defense chemistry, with potential applications in plant breeding and insight into how these pathways change due to stress.
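The chain from gene expression to enzyme level to Michaelis-Menten metabolite kinetics can be illustrated with a minimal sketch. This is not the authors' GEEM implementation: the single-reaction pathway, the mRNA time course, every rate constant, and all function names are hypothetical, and forward Euler integration is used only to keep the example short.

```python
def mrna_level(t, induced):
    """Hypothetical mRNA time course: flat under mock, ramping up under treatment."""
    return 1.0 + (0.5 * t if induced else 0.0)


def simulate(induced, t_end=5.0, dt=0.01,
             k_tl=1.0, k_deg=0.5, k_cat=2.0, k_m=5.0, s0=10.0):
    """Forward-Euler integration of a toy mRNA -> enzyme -> metabolite chain.

    dE/dt = k_tl * mRNA(t) - k_deg * E      (enzyme synthesis and decay)
    v     = k_cat * E * S / (k_m + S)       (Michaelis-Menten rate)
    dS/dt = -v,  dP/dt = +v                 (substrate and product)

    All parameter values are illustrative assumptions, not fitted constants.
    """
    enzyme, substrate, product = 0.0, s0, 0.0
    steps = int(round(t_end / dt))
    for i in range(steps):
        t = i * dt
        rate = k_cat * enzyme * substrate / (k_m + substrate)
        enzyme += dt * (k_tl * mrna_level(t, induced) - k_deg * enzyme)
        substrate -= dt * rate
        product += dt * rate
    return enzyme, substrate, product


if __name__ == "__main__":
    for label, induced in (("mock", False), ("treated", True)):
        enzyme, substrate, product = simulate(induced)
        print(f"{label}: enzyme={enzyme:.2f} substrate={substrate:.2f} product={product:.2f}")
```

In this toy setting, the higher mRNA level under treatment raises the enzyme level, which in turn speeds up conversion of substrate to product. Fitting such a model to data would mean adjusting the rate constants, which the sketch leaves fixed.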

4

Improved model representation of the photosynthetic light reactions reduces estimates of global gross primary productivity

Lamour, J.; Chave, J.; Johnson, J.; Berry, J.; Davidson, K. J.; Ely, K. S.; Fang, L.; Koven, C. D.; Needham, J. F.; Niinemets, U.; Perez, R. P. A.; Schmiege, S. C.; Zhihong, S.; Way, D. A.; Rogers, A.

2026-05-12 plant biology 10.64898/2026.05.08.723728 medRxiv

Top 0.1%

9.9%

The assimilation of carbon dioxide by plants can be predicted by the Farquhar, von Caemmerer and Berry model of photosynthesis. This largely mechanistic model is central to understanding how plants influence Earth's climate. However, it represents the use of light by photosynthesis using an empirical formulation. Johnson and Berry proposed an alternative mechanistic formulation based on the functioning of the cytochrome b6f complex that includes key steps in light harvesting and electron transport. We compared both formulations using photosynthetic light response measurements from 146 C3 species spanning arctic to tropical biomes and implemented them in the terrestrial biosphere model ELM-FATES to simulate global photosynthesis. The Johnson and Berry formulation better fitted the measured response of leaf-level photosynthesis to light, and predicted lower photosynthetic rates at intermediate light levels, which decreased global estimates of terrestrial photosynthesis by 8%. Our findings support adopting the Johnson and Berry formulation to improve global carbon cycle modeling.

5

Knowledge-guided Bayesian optimization using pre-trained LLMs speeds up the identification of superior genotypes from germplasm collection

Hamazaki, K.; Tsuda, K.

2026-07-02 bioinformatics 10.64898/2026.06.28.735149 medRxiv

Top 0.1%

9.0%

Background: Germplasm collections contain wide genetic diversity that is valuable for plant breeding, but conducting phenotypic evaluation for all genotypes in field trials is rarely feasible. Bayesian optimization offers a way to decide, season by season, which genotypes to cultivate in order to identify superior genotypes with fewer evaluations. However, standard Bayesian optimization commonly starts from randomly selected genotypes and mainly relies on surrogate models built from marker genotype information, while the text-based passport information that accompanies germplasm is not fully used. We examined whether pre-trained large language models can provide prior knowledge that improves these decisions in germplasm evaluation. Results: We constructed a large-language-model-guided Bayesian optimization framework that introduces large language models into two parts of the Bayesian optimization workflow. In zero-shot warmstarting, a large language model proposes initial genotypes using passport information such as cultivar name, country of origin, and subpopulation, optionally together with principal component scores derived from genome-wide single-nucleotide-polymorphism markers. In addition, we evaluated a large-language-model-based surrogate model that predicts phenotypic values for untested genotypes using in-context learning from previously evaluated genotypes. Using a rice germplasm panel and two target traits (seed number per panicle for maximization and protein content for minimization), we compared strategies. For seed number per panicle, zero-shot warmstarting with a general-purpose instruction-following model reduced the number of evaluated genotypes needed to reach the best genotype, whereas improvements were small for protein content. 
When genomic information was available, Gaussian-process-based Bayesian optimization was the strongest overall approach, while the large-language-model-based surrogate model outperformed random baselines and was competitive in some settings. When genomic information was not available, predictions based on passport information improved efficiency compared with fully random strategies. Conclusions: Pre-trained large language models can inject useful agronomic knowledge into Bayesian optimization for germplasm evaluation, particularly by improving early-stage genotype selection, and can also support optimization when genomic information is unavailable. As models better handle long genomic sequences together with passport information, large-language-model-guided Bayesian optimization may become a practical and explainable decision-support approach for agricultural optimization.

6

Evaluating crop models for future climate scenarios: wheat yield predictions using APSIM and STICS under combined CO2, warming, and water deficit conditions

Severini, A. D.; Gawinowski, M.; Bancal, M.-O.; Launay, M.; Deswarte, J.-C.; Chenu, K.

2026-06-10 plant biology 10.64898/2026.06.07.730737 medRxiv

Top 0.1%

7.8%

Crop models are essential for predicting climate change impacts on agriculture, yet their validation under multi-stress conditions remains limited. This study evaluated two widely used wheat models, APSIM and STICS, using data from three Free-Air CO2 Enrichment (FACE) experiments (USA, Germany, Australia) combining elevated CO2 (eCO2), water deficit, and warming. Environmental characterisation using simulation-based stress indices revealed that intended "controls" frequently experienced hidden heat and water stress, meaning models were calibrated on crops already undergoing physiological adjustments. Evaluation of simulated yield and yield components revealed a clear hierarchy in prediction errors (RRMSE): unlimited conditions (3-9%) < single stress (4-27%, with a need to improve response to heat stress) < combined stress (17-123%). Elevated CO2 generally increased prediction uncertainty for crops experiencing water stress. Our results suggest that current stress functions from the models fail to capture the synergistic coupling between drought and heat stress. This highlights the urgent need for more mechanistic modelling to improve the reliability of climate change impact assessments.
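The RRMSE figures quoted above can be read against one common definition of relative root mean square error: the RMSE of simulated versus observed values, divided by the observed mean and expressed as a percentage. The sketch below assumes that definition. The preprint's exact normalisation is not stated in the abstract, and the function name is hypothetical.

```python
import math


def rrmse(observed, simulated):
    """Relative RMSE in percent, normalised by the mean of the observations.

    Assumed definition: 100 * sqrt(mean((sim - obs)^2)) / mean(obs).
    """
    if not observed or len(observed) != len(simulated):
        raise ValueError("observed and simulated must be non-empty and equally long")
    n = len(observed)
    mse = sum((s - o) ** 2 for o, s in zip(observed, simulated)) / n
    return 100.0 * math.sqrt(mse) / (sum(observed) / n)


if __name__ == "__main__":
    # Two plots with observed yields 4 and 6 t/ha, both simulated as 5 t/ha.
    print(rrmse([4.0, 6.0], [5.0, 5.0]))
```

Because the error is scaled by the observed mean, low-yielding stressed treatments can show large RRMSE values even when the absolute error is modest.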

7

Multispecies Mixtures: An Individual-Centered Quantitative Genetic Framework for Complex Plant Neighborhoods

Salas, N.; Montazeaud, G.; Bourke, P. M.; Baranger, A.; David, J.

2026-05-29 genetics 10.64898/2026.05.27.728303 medRxiv

Top 0.1%

6.4%

Modern agriculture faces major sustainability challenges, including stagnating yields, dependence on fossil resources, and severe environmental impacts. Increasing intra- and interspecific diversity within plots through agroecological design is a promising method for enhancing crop productivity and stability. However, mixed-crop performance remains highly variable, and the genetic architecture of interactions within heterogeneous canopies is poorly understood. Two quantitative genetic frameworks have been proposed: trait-based models, which describe how interacting traits shape phenotypes, and variance-based models, which treat neighbor genotype effects as "black-box" social effects. However, existing variance-based models have been developed almost exclusively for intraspecific interactions and simple neighborhoods. We propose a general multispecies framework describing how a focal plant's phenotype and total breeding value arise from its own direct effects and from the indirect effects of conspecific and heterospecific neighbors. We derived analytical expressions for phenotypic variance, inter-individual covariance, total breeding value variance, and relative heritable variance, which explicitly account for spatial structure, relatedness, and environmental similarities. Using a two-species alternating-row field layout and extensive simulations based on flexible variance-covariance structures, we evaluated the statistical power and bias of joint mixed-model estimators of direct and indirect genetic and environmental effects under a wide range of parameter combinations. Our results show that accurate separation of direct and indirect effects depends on trait heritability and replication, and that modeling genetic covariances across effects and species substantially improves estimation accuracy. This framework provides a unified, individual-centered basis for analyzing complex multispecies neighborhoods and quantifying the breeding potential of plant communities.

Article Summary

Growing several crop species or varieties together in the same field can boost yield and stability, but the outcome is unpredictable and the genetic causes remain unclear. We developed a theoretical and statistical framework that links each plant's performance to its own genes and to those of its neighbors, both from the same and from a different species. Computer simulations of a two-species field showed that these direct and neighbor-driven genetic effects can be reliably separated when enough plants are measured per variety. The framework opens the way to breeding crop mixtures that perform well specifically when grown alongside another species.

8

A genetic network coordinated by TCP16 and LHY integrates regulation of the vegetative-reproductive phase transition in Arabidopsis thaliana

Motienoparvar, P.; Ebrahimi, A.; Kavousi, K.; Javaran, M. J.; Spillane, C.; McKeown, P.

2026-05-29 genetics 10.64898/2026.05.26.727858 medRxiv

Top 0.1%

5.3%

The transition to flowering in Arabidopsis thaliana is a complex process governed by many biological and environmental stimuli. Although many of the genes which regulate this process have been identified over the past 30 years, it remains unclear how these networks are integrated. In this study, we used the transcriptional responses of Col-0, Ler-1, and three mutant lines to build a genome-wide regulatory network of Arabidopsis thaliana during the flowering transition. The expression profiles of 22,810 genes across five genotypes were collected from the GEO database Series GSE57, from which we assigned flowering-time genes to different interacting modules by an adapted form of Hierarchical Complete Linkage Clustering (HCLC) after reconstruction of regulatory networks according to the Position Weight Matrix (PWM)-based method. Within these modules, we identified 77 core genes and 31 controller or driver genes. We identify two genes, LHY and, less expectedly, the transcription factor TCP16, to be topographically positioned at the regulatory hubs of a nine-gene transcriptional control unit, implying they have the capacity to integrate information from across the flowering time pathways which interpret different environmental or endogenous cues during the vegetative-reproductive transition. Interrogating their behaviour across transcriptional datasets, we show that both LHY and TCP16 show transcriptional oscillations during the flowering transition, with a wavelength that varies depending on environmental conditions. We suggest that the transcriptional responses of LHY and TCP16 allow them to regulate the flow of information through the genetic networks which integrate different floral transition cues, and that genetic modelling approaches can provide new insights into the regulation of well-studied biological processes such as the flowering transition.

Author summary

Deciding when to flower is a critical step for plants in completing their life cycles. It is also of key agricultural importance, as crops need to flower at the right time of year to allow efficient pollination and harvesting. Many genes are known to affect flowering time control in plants. Here, we use computational approaches to estimate how different genes interact in flowering time control in Arabidopsis, a small plant in the mustard family which is widely used for molecular studies. We use large-scale studies of how gene expression changes in different plant lines which have disrupted or adjusted flowering time to group the many genes involved in flowering into different interacting pathways, which we visualise as sets of coloured nodes controlling one another in a network. We show that two genes may have new roles in integrating information from different pathways, and discuss how their behaviour, including the daily oscillations in their expression, might help them to function as integrators of biological information.

9

Data-informed modelling captures metabolic reprogramming and reveals branch points mediating cold stress response and growth trade-offs in rice

Soltani, F.; Moreira Machado, T.; Weder, J.-N.; Camborda de la Cruz, S.; Peleke, F. F.; Szymanski, J. J.; Töpfer, N.

2026-07-07 plant biology 10.64898/2026.07.07.736767 medRxiv

Top 0.1%

5.3%

Understanding stress-induced metabolic reprogramming in crop plants can inform breeding strategies and support the development of stress-resilient varieties. Genome-scale metabolic modelling has shown promise in elucidating network-level responses to changing environments, yet as an optimality-based approach it relies on the definition of an objective function, which is far from trivial for non-optimal conditions. To address this uncertainty, we used a time-resolved, data-informed metabolic model of rice (Oryza sativa L.) cold stress response as a test case, and explored two complementary approaches. We used sampling of the solution space combined with machine learning to identify reactions and pathways best characterizing the stress-induced metabolic shift, and used this information to perform Pareto analysis, placing growth and a stress-related objective in competition. This trade-off analysis identified key branch points in carbohydrate, amino acid, phenylpropanoid, nucleotide, and fatty acid biosynthesis, where resource reallocation towards stress-protection comes at the expense of growth. It further revealed differential flux modes across subcellular compartments and shifts in reducing equivalent provision as distinguishing features of the stress response. Together, these results provide a mechanistic understanding of the metabolic trade-offs and branch points governing cold stress response, and identify potential targets to optimize the cold response-growth trade-off in rice.

10

Efficient Optimization of Genotype Pairs for Intercropping using Genomic Prediction and Bayesian Optimization

Kinoshita, S.; Iwata, H.

2026-05-18 genomics 10.64898/2026.05.15.725387 medRxiv

Top 0.1%

4.9%

Intercropping is a promising strategy to improve productivity and sustainability in agricultural systems, but designing effective genotype combinations remains a major challenge owing to the rapid increase in possible pairings as the number of candidate genotypes increases. This creates a practical bottleneck because field evaluation of all combinations is infeasible under realistic resource constraints. Here, we propose a framework that integrates genomic prediction and Bayesian optimization to support efficient decision-making for intercropping system design. Using genome-wide marker data from sorghum and soybean, we simulated intercropping performance across 5,214 genotype pairs under certain genetic architectures, including variation in heritability, correlations between direct and indirect genetic effects, and the contribution of pair-specific interactions. Genomic prediction models incorporating direct and indirect genetic effects substantially improved prediction accuracy compared with models based on direct genetic effects alone, and inclusion of specific mixing ability further enhanced the performance under high-heritability conditions. When coupled with Bayesian optimization, the models rapidly identified superior genotype pairs, requiring fewer evaluation cycles than random or prediction-only search strategies. Acquisition functions that account for predicted uncertainty were most effective in complex scenarios involving interaction effects or negative correlations between direct and indirect effects. These results demonstrate that combining genomic prediction with Bayesian optimization can substantially reduce the experimental burden associated with intercropping design, while improving the efficiency of identifying high-performing genotype pairs. 
The proposed framework provides a practical approach for prioritizing candidate mixtures in breeding and field evaluation, and contributes to the development of data-driven strategies for sustainable agricultural systems.

Highlights

- A data-driven framework was developed to optimize genotype pairs in intercropping.
- Modeling indirect effects improved prediction accuracy across genotype pairs.
- Pair-specific interactions enhanced prediction under high-heritability conditions.
- Bayesian optimization identified superior pairs under limited evaluation capacity.
- The framework reduces field-testing requirements for intercropping system design.
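The abstract's point that uncertainty-aware acquisition functions help in complex scenarios can be illustrated with expected improvement, a standard acquisition function for Bayesian optimization. Whether the preprint uses this particular function is an assumption. The pair labels, predicted means, and standard deviations below are invented for illustration, and the function names are hypothetical.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement (maximisation) for a Gaussian prediction."""
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))        # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (mean - best_so_far) * cdf + std * pdf


def pick_next_pair(predictions, best_so_far):
    """Choose the genotype pair with the largest expected improvement.

    predictions maps a (sorghum, soybean) label pair to (predicted mean, std).
    """
    return max(
        predictions,
        key=lambda pair: expected_improvement(*predictions[pair], best_so_far),
    )


if __name__ == "__main__":
    # Hypothetical genomic predictions for two untested pairs.
    predictions = {
        ("S1", "B1"): (10.0, 0.1),  # highest predicted mean, little uncertainty
        ("S2", "B2"): (9.5, 2.0),   # lower mean, large uncertainty
    }
    print(pick_next_pair(predictions, best_so_far=10.0))
```

A prediction-only search would pick the pair with the highest predicted mean, whereas expected improvement here favours the more uncertain pair because it has a realistic chance of exceeding the current best.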

11

A Bayesian approach for identifying similar transcript dynamics using curve registration

Kristianingsih, R.; Calderwood, A.; Sidhu, G.; Woodhouse, S.; Woolfenden, H. C.; Kurup, S.; Wells, R.; Morris, R. J.

2026-04-29 bioinformatics 10.64898/2026.04.26.720911 medRxiv

Top 0.1%

4.3%

Changes in gene expression over time can provide valuable insights into developmental processes and responses to the environment. Differences in expression may be indicative of potential differences in regulation. Comparing transcript dynamics may help identify correspondences between developmental stages within and between species, differences in the timing of key events during development, and transcriptional responses to treatments or perturbations. A straightforward comparison between the dynamics is, however, hindered by measurements that were taken at different time points and over different timescales. To address this, we developed a statistical approach that seeks the optimal alignment between two time series as a function of a temporal shift and stretch. We validated our approach using simulated data and applied it to several transcriptome datasets, including comparisons between different plant species. Our development facilitates knowledge transfer from model systems to less studied species, the identification of modules of co-regulated genes, and the discovery of condition-specific, temporally differentially expressed genes. The method is freely available as an R package.
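Aligning two time series through a temporal shift and stretch can be sketched as follows. This is a plain least-squares grid search, not the Bayesian method or the R package described above. The function names, the linear interpolation, the candidate grids, and the minimum-overlap rule are all assumptions made for illustration.

```python
def interpolate(times, values, t):
    """Linear interpolation on strictly increasing times; None outside the range."""
    if t < times[0] or t > times[-1]:
        return None
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            return values[i] + (values[i + 1] - values[i]) * (t - t0) / (t1 - t0)
    return None


def register(ref_t, ref_v, qry_t, qry_v, shifts, stretches, min_overlap=3):
    """Grid-search the (shift, stretch) that best maps the query onto the reference.

    A query time t is mapped to stretch * t + shift on the reference time axis.
    Returns (mean squared error, shift, stretch), or None if no candidate
    leaves at least min_overlap query points inside the reference range.
    """
    best = None
    for stretch in stretches:
        for shift in shifts:
            sq_errors = []
            for t, v in zip(qry_t, qry_v):
                ref_value = interpolate(ref_t, ref_v, stretch * t + shift)
                if ref_value is not None:
                    sq_errors.append((v - ref_value) ** 2)
            if len(sq_errors) < min_overlap:
                continue
            score = sum(sq_errors) / len(sq_errors)
            if best is None or score < best[0]:
                best = (score, shift, stretch)
    return best


if __name__ == "__main__":
    ref_t = list(range(21))
    ref_v = [float(t * t) for t in ref_t]
    # Query sampled on its own clock, running twice as slowly and starting 4 units late.
    qry_t = [0, 1, 2, 3, 4, 5]
    qry_v = [float((2 * t + 4) ** 2) for t in qry_t]
    print(register(ref_t, ref_v, qry_t, qry_v, shifts=[0, 2, 4, 6], stretches=[1, 2, 3]))
```

A Bayesian treatment would additionally compare the registered model against a non-registered alternative rather than simply taking the lowest error.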

12

A novel matrix multiplication framework for modeling genotype-by-environment interaction in genomic prediction

Montesinos-Lopez, O. A.; Montesinos-Lopez, A.; Montesinos-Lopez, J. C.; Crossa, J.; Dreisigacker, S.; Hernandez-Suarez, C. M.; Ortiz, R.

2026-05-15 genetics 10.64898/2026.05.11.724414 medRxiv

Top 0.1%

2.8%

Accurate modeling of genotype-by-environment (GxE) interaction is critical for genomic prediction in plant breeding but remains challenging due to complex interaction structures. Conventional models often use the Hadamard product of genotype and environment covariance matrices to capture joint similarity, which may not fully represent GxE complexity. Here we propose a novel framework that derives covariance structures from the matrix multiplication of genotype and environment kernels, decomposing these into symmetric components incorporated as random effects in mixed models. Evaluated across 11 wheat and rice multi-environment datasets, this approach consistently outperformed the traditional Hadamard-based model, improving prediction accuracy by up to 13.2% in Pearson's correlation and enhancing top-selection accuracy. Combining both methods yielded the highest performance, indicating complementary information capture. This framework offers a flexible, interpretable, and computationally feasible extension for modeling GxE interaction, potentially enhancing genomic selection effectiveness under diverse environmental conditions.

13

simSOMA: a cell-lineage based simulator of the somatic VAF spectrum in plants

Johannes, F.

2026-07-01 genomics 10.64898/2026.06.28.735079 medRxiv

Top 0.1%

2.4%

Plants accumulate somatic mutations during growth, and some of these mutations can spread from local cell lineages into branches, organs, or reproductive tissues. There is growing interest in these variants because they can underlie bud-sport traits in crops, contribute to within-organism somatic selection, and provide genetic variation that may be transmitted vegetatively or sexually to future generations. Recent genomic sequencing of bulk and layer-enriched plant tissues has shown that de novo somatic variants can generate complex variant allele-frequency (VAF) spectra. Interpreting these spectra requires understanding how mutations arising during mitotic cell division are filtered or amplified through shoot growth, branching, and organ formation. Because these processes interact across multiple scales, their combined effects are difficult to derive analytically. Here, we present simSOMA, a modular simulator that links rooted plant topologies to explicit cell-lineage dynamics. simSOMA models somatic mutation accumulation during stem-cell self-renewal in the shoot apical meristem, clonal expansion from the stem-cell niche to the meristem periphery, branch founding, and organ formation. Applying simSOMA across diverse growth scenarios revealed how individual processes can be isolated, varied, and combined to assess their effects on organ-level VAF spectra and among-organ variant sharing. The same simulated spectra can also be transformed to represent bulk or layer-enriched sampling and phased or unphased variant readouts, separating effects of developmental history from those introduced by tissue composition and allele counting. Because simSOMA is organized around modules with defined input-output interfaces, individual developmental components can be replaced or extended as new empirical information becomes available. 
This makes simSOMA a flexible tool for testing alternative models of somatic mosaicism in plants and for guiding the design and interpretation of VAF-based sequencing studies. The simulator is available at https://github.com/jlab-code/simSOMA.

14

An axiomatic approach to cultivar ranking in multi-environment trials

Kondratev, A. Y.; Ianovski, E.; Voronina, E.; Crossa, J.

2026-07-01 genetics 10.64898/2026.06.27.734959 medRxiv

Top 0.1%

2.1%

Multi-environment trials are central to cultivar evaluation because they reveal how candidate cultivars perform across locations, years, management conditions, and stress environments. The resulting yield matrix is a rich source of data on genotype-by-environment interaction, and a wide literature on estimation, decomposition, visualisation, and prediction of yield potential and stability has flourished. However, the ultimate question of which cultivar to recommend on the basis of such a matrix is often left implicit. The question is far from trivial, and in this paper we formulate cultivar recommendation as an axiomatic ranking problem. This framework is rich enough to encompass the existing literature on stability indices, as well as any other deterministic ranking procedure. We show that many commonly used stability-based procedures can violate minimal criteria of efficiency or consistency. The result of such violations is that a cultivar with uniformly high yield could be ranked below a cultivar with uniformly low yield, or the relative ranks of two cultivars could depend on whether or not a third cultivar is present in the matrix. Our results prove that under a small number of such criteria the space of admissible rules collapses to the family of power means and their limiting cases. If we further wish to allow multiplicative normalisation of yield, we are left with the geometric mean as the unique solution.
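The role of the geometric mean can be illustrated with a small numerical sketch. It assumes that "multiplicative normalisation" means rescaling all yields within one environment by a common positive factor, which is a reading of the abstract rather than the paper's formal axiom. The cultivar names, yield values, and function names are hypothetical.

```python
import math


def power_mean(values, p):
    """Power (generalised) mean of positive values; p = 0 gives the geometric mean."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    n = len(values)
    if p == 0:
        return math.exp(sum(math.log(v) for v in values) / n)
    return (sum(v ** p for v in values) / n) ** (1.0 / p)


def rank_cultivars(yields, p):
    """Rank cultivar names best-first by the power mean of their yields."""
    return sorted(yields, key=lambda name: power_mean(yields[name], p), reverse=True)


if __name__ == "__main__":
    # Yields of two cultivars in two environments (illustrative numbers).
    raw = {"A": [2.0, 8.0], "B": [4.5, 4.5]}
    # Same data after multiplying every yield in environment 1 by 10.
    rescaled = {"A": [20.0, 8.0], "B": [45.0, 4.5]}
    for p, label in ((1, "arithmetic"), (0, "geometric")):
        print(label, rank_cultivars(raw, p), rank_cultivars(rescaled, p))
```

In this example the arithmetic-mean ranking changes when one environment is rescaled, while the geometric-mean ranking does not, because a common factor in one environment multiplies every cultivar's geometric mean by the same amount.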

15

Growth under constraints: root tip development controls trade-offs between speed and mechanical efficiency

Dupuy, L. X.; Yao, J.; de las Heras Martinez, G.

2026-05-14 plant biology 10.64898/2026.05.14.724970 medRxiv

Top 0.1%

2.1%

Growth kinematics and soil mechanics are key to explaining how roots overcome the mechanical resistance of soil, yet few studies link these two factors. Formulas for cone penetration tests are typically used to infer the friction experienced by roots, but these fail to consider how growth affects the external forces applied on the root. This study formalised how expansive growth in the root apical meristem can reduce soil friction, and applied the framework to analyse the growth strategy of six plant species. The results of the analysis revealed trade-offs between reducing friction, maintaining a desired growth trajectory, and maintaining elongation rate. A shorter elongation zone can reduce the fraction of the mechanical energy lost to friction, but this is done at the expense of the elongation rate. A sharper tip or increased radius can help roots maintain the elongation rate at no energetic cost, but these strategies come with the cost of growth instability (tortuous roots) and a decrease in specific root length, respectively. During establishment, root strategies may therefore occupy a 2-dimensional trait space in which the mechanical efficiency of growth is balanced against the explorative-exploitative trade-off.

Highlights

- Growth and form of root tips explain how plants overcome mechanical resistance from the soil
- Trade-offs link the energy lost by friction, growth stability and elongation rate of roots
- Larger roots allow faster growth independently of these trade-offs
- New framework formalises plant strategies to acquire soil resources

16

Application of Computer Vision Tools to Maize Genomic Data for Trait Prediction and Gene Discovery

Higgins, S. A.; Anible, E.; Muthupari, M.; Dibble, C.; Murdoch, R. W.

2026-05-26 bioinformatics 10.64898/2026.05.21.726890 medRxiv

Top 0.1%

1.9%

Artificial intelligence and machine learning for computer vision (CV) and image recognition is a rapidly evolving field with multiple potential applications in plant genomics. While CV has been widely adopted by the research community for plant phenotyping and disease surveillance, applications of CV tools to plant genome analysis are underrepresented. CV tools may complement traditional statistical classification tools used in plant genomics, since CV perceives problems holistically rather than granularly (in terms of pattern recognition), which is particularly applicable to analysis of large, complex eukaryotic genomes. In this study, we report on a new strategy to apply existing CV tools to classify plant genotypes and predict genotype-phenotype relationships. A technique was developed for converting maize genome resequencing data into a set of images reminiscent of a quick response (QR) code. Several hundred maize genomes were processed, and it was demonstrated that CV models can successfully categorize genome images into heterotic groups (accuracy and recall > 0.8). Models for classifying genome images into phenotypic trait groups (such as short, medium, and tall plant height) performed with moderate success for the most heritable trait analyzed (ear height; accuracy and recall > 0.5). Querying model results permitted identification of genome regions that were important for model classification predictions. The CV model results revealed enriched metabolic pathways consistent with the traits under consideration. Overall, our initial application of CV tools to plant genome analysis highlights their applicability to genomic data. Design of new CV architectures optimized for genome-derived images may further improve upon our initial results, which were generated using only off-the-shelf CV tools optimized for unrelated image analysis tasks.
Core ideas:
- AI/ML computer vision (CV) tools were applied to encoded maize genomes
- CV image classification tools were able to successfully classify encoded genomes into heterotic groups
- Trait values of maize strain ear height could be predicted with moderate success
- Genome regions encoding plausible metabolic pathways used by the classifier were identified
- Recommendations for improved success of CV for genotype-to-phenotype are discussed
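The abstract does not describe how the genome images are built, so the following is only a rough sketch of one way genotype calls could be laid out as a QR-like grid. The encoding (0, 1 or 2 alternate-allele copies mapped to grey levels, missing calls to a separate level) and the function names `genotypes_to_image` and `to_pgm` are assumptions made for illustration, not the authors' method.

```python
def genotypes_to_image(calls, width):
    """Lay out genotype calls row by row as grey levels.

    calls: sequence of 0, 1, 2 (alternate-allele copies) or None (missing).
    Assumed mapping: 0 -> black, 1 -> mid grey, 2 -> white, missing -> dark grey.
    """
    levels = {0: 0, 1: 128, 2: 255}
    pixels = [levels.get(c, 64) for c in calls]
    # Pad the last row with black so that every row has the same width.
    pixels += [0] * ((-len(pixels)) % width)
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]

def to_pgm(image):
    """Serialise the grid as a plain-text PGM (P2) greyscale image."""
    height, width = len(image), len(image[0])
    rows = [" ".join(str(v) for v in row) for row in image]
    return "P2\n{} {}\n255\n{}\n".format(width, height, "\n".join(rows))

# Five hypothetical calls arranged in a grid three pixels wide.
image = genotypes_to_image([0, 1, 2, None, 2], width=3)
pgm_text = to_pgm(image)
```

Writing `pgm_text` to a `.pgm` file gives an image that standard viewers and most image libraries can read, which is the kind of input an off-the-shelf image classifier expects.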

17

Physics-Informed Neural Networks for Parameter Recovery in the Repressilator Oscillatory Model

Casajuana, B.; Casals-Franch, R.; Lopez Garcia de Lomana, A.; Marti-Puig, P.; Villa-Freixa, J.

2026-05-15 bioinformatics 10.64898/2026.05.12.724679 medRxiv

Top 0.1%

1.8%

Parameter estimation in nonlinear biological dynamical systems is a difficult inverse problem because the governing equations are often stiff or oscillatory, the data are sparse and noisy, and the objective landscape is non-convex. Physics-informed neural networks (PINNs) offer an alternative to purely simulation-based calibration by representing state trajectories with neural networks while penalizing violations of the governing equations. This paper studies the empirical reliability of PINNs for recovering the parameters of the repressilator, a synthetic genetic oscillator formed by three cyclically repressive genes. We use synthetic time-series generated from the standard ordinary differential equation model and train inverse PINNs to estimate the production parameter β and the Hill coefficient n. The study varies observation noise, partial observation of repressors, sampling density, sensitivity to initial parameter guesses, and the difference between stable and oscillatory regimes. The results show that PINNs can reconstruct trajectories accurately when the model structure is correct and the three repressors are observed, but parameter recovery is more fragile than trajectory fitting. Noise, sparse sampling, unobserved variables, and unfavorable initial guesses increase the risk of biased estimates. The stable regime is easier to reconstruct, whereas the oscillatory regime provides richer information but also exposes optimization sensitivity. These findings support PINNs as a useful reverse-engineering tool for small gene-regulatory ODE models, while highlighting the need for repeated runs, uncertainty reporting, and experimental designs that improve identifiability.
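The physics penalty at the heart of an inverse PINN can be sketched without a neural network. The example below assumes a reduced, dimensionless three-variable repressilator, dx_i/dt = β / (1 + x_j^n) - x_i, with each protein repressed by the previous one in the cycle; the abstract does not state the exact model form, so this is an assumption. A simulated trajectory stands in for the network output, and the residual is evaluated with finite differences rather than automatic differentiation.

```python
def repressilator_rhs(state, beta, n):
    """Assumed reduced repressilator: each protein is repressed by the previous one."""
    x1, x2, x3 = state
    return (
        beta / (1.0 + x3 ** n) - x1,
        beta / (1.0 + x1 ** n) - x2,
        beta / (1.0 + x2 ** n) - x3,
    )

def rk4_simulate(state0, beta, n, dt, steps):
    """Classical fourth-order Runge-Kutta integration; returns a list of states."""
    state = tuple(state0)
    trajectory = [state]
    for _ in range(steps):
        k1 = repressilator_rhs(state, beta, n)
        k2 = repressilator_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)), beta, n)
        k3 = repressilator_rhs(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)), beta, n)
        k4 = repressilator_rhs(tuple(s + dt * k for s, k in zip(state, k3)), beta, n)
        state = tuple(
            s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
        trajectory.append(state)
    return trajectory

def physics_residual(trajectory, dt, beta, n):
    """Mean squared mismatch between d(state)/dt and the model right-hand side.

    Derivatives use central differences here; a PINN would differentiate
    its network output instead.
    """
    total, count = 0.0, 0
    for i in range(1, len(trajectory) - 1):
        rhs = repressilator_rhs(trajectory[i], beta, n)
        for j in range(3):
            derivative = (trajectory[i + 1][j] - trajectory[i - 1][j]) / (2.0 * dt)
            total += (derivative - rhs[j]) ** 2
            count += 1
    return total / count

# Synthetic data from hypothetical "true" parameters beta = 10, n = 3.
dt = 0.01
data = rk4_simulate((1.0, 1.5, 2.0), beta=10.0, n=3.0, dt=dt, steps=2000)
residual_true = physics_residual(data, dt, beta=10.0, n=3.0)
residual_wrong_beta = physics_residual(data, dt, beta=8.0, n=3.0)
residual_wrong_n = physics_residual(data, dt, beta=10.0, n=2.0)
```

The residual is smallest at the parameters that generated the data, which is the signal an inverse PINN follows when it updates β and n alongside the network weights. With noisy or sparse observations the derivative estimates degrade, consistent with the fragility the abstract reports for parameter recovery.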

18

Identifying water stress response haplotypes in barley using latent environmental covariates

Aldiss, Z.; Brunner, S.; Heidariask, B.; Chenu, K.; Van Haeften, S.; Baraibar, S.; Ganesgalingam, D.; Moody, D.; Hickey, L.; Lam, Y.

2026-05-07 plant biology 10.64898/2026.05.04.722807 medRxiv

Top 0.1%

1.7%

Purpose: Genotype-by-environment (G x E) interactions represent a major obstacle to increasing genetic gain in crop breeding, with the underlying physiological drivers often remaining obscured within conventional statistical models. This case study presents a novel framework that transforms the latent factors from Factor Analytic (FA) multi-environment trial (MET) models into heritable quantitative traits, enabling the genetic dissection of adaptive response patterns. Methods: A Factor Analytical Linear Mixed Model (FA-LMM) was fit to plot-level yield data for 1,036 barley genotypes across eight Australian trials. Results: Correlation of the factor loadings with APSIM-simulated environmental covariates demonstrated that the second latent factor FA2 was strongly correlated with the Water Stress Index (r = -0.83) during the critical flowering period, establishing water availability as the main biological axis of crossover G x E. Genotypic scores for the derived traits, Overall Performance (OP) and Water Stress Response (WSR), were subjected to high-resolution haplotype-based mapping using local Genomic Estimated Breeding Values (GEBV). Conclusion: This analysis successfully identified major genomic regions that accounted for a substantial proportion of the additive genetic variance. Gene Ontology enrichment of candidate genes within the top haploblocks implicated fundamental pathways related to energy homeostasis, root development, and stress response, with notable candidates including FTsH11, BPS1, and TDP1. The distribution of favourable Haplotypes of Interest (HOI) in elite cultivars suggested a historical signature of inadvertent selection for these adaptive mechanisms. This framework provides an explicit bridge between statistical modelling and functional genomics, offering breeders actionable genetic targets for accelerated development of climate-resilient cereals.
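The step that links a latent factor to an environmental driver is a plain correlation between per-trial factor loadings and a per-trial covariate. A minimal sketch follows, assuming a Pearson correlation (the abstract reports r but does not name the estimator). The eight loading and stress values are placeholders invented for the example and are not data from the study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Placeholder values for eight trials (NOT from the study).
fa2_loadings = [0.42, 0.10, -0.31, 0.55, -0.20, 0.05, -0.48, 0.30]
water_stress_index = [0.20, 0.45, 0.80, 0.15, 0.70, 0.50, 0.90, 0.35]

r = pearson_r(fa2_loadings, water_stress_index)
```

With only eight trials the estimate carries wide uncertainty, so a correlation like this is better read as a pointer to the likely environmental driver than as a precise effect size.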

19

Novel linkage disequilibrium-based genotype-by-environmental interaction method for genomic prediction of cotton yield and fibre quality traits

Li, Z.; Li, X.; Liu, S.; Wilson, I.; Zhu, Q.-H.; Stiller, W.; Conaty, W.

2026-05-06 plant biology 10.64898/2026.05.03.722538 medRxiv

Top 0.2%

1.5%

Genomic prediction (GP) across diverse environments has the potential to accelerate genetic gain in cotton breeding programs. A major challenge in GP is modelling genotype-by-environment interactions (GEI), which is essential for selecting stable and high-performing genotypes under variable production conditions. However, incorporating GEI into GP models increases dimensionality and computational complexity, risking models that are impractical to use on commercial breeding-scale data sets because of run times and computational demands. This study addresses two primary aims. First, we evaluate the practical benefits of GEI-informed GP for predicting economically important cotton traits. Second, we develop and assess advanced statistical modelling strategies for integrating genomic and environmental data at scale. We propose a dimensionality reduction approach that combines linkage disequilibrium network analysis with principal component techniques to reduce redundancy while preserving informative variation. Using this reduced dataset, we implement Bayesian linear regression models and, for comparison, deep residual neural networks for genomic prediction. Analyses were conducted on a large multi-environment dataset from the CSIRO cotton breeding program, comprising 3,236 breeding lines, 54 environmental covariates, and 8,049 yield and fibre quality phenotype records collected over 10 years and 9 locations representing 41 year-location combinations. Results demonstrate that Bayesian linear regression approaches generally outperform BG-BLUP models, with all three linear/linear mixed methods providing clearly more reliable performance than the deep learning models. These findings highlight the value of using interpretable statistical models for integrating genomic and environmental information to support selection decisions under diverse environmental conditions.
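The dimensionality reduction idea can be sketched in a simplified form. The version below assumes that LD between two markers is measured as the squared correlation of their genotype dosages and that the "network" links markers whose r² exceeds a threshold. It then replaces each connected cluster by the mean dosage of its members, which is a deliberately simpler stand-in for the principal component step named in the abstract. All function names and the toy marker matrix are hypothetical.

```python
def r_squared(a, b):
    """Squared Pearson correlation between two markers' dosages (an LD proxy)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (var_a * var_b)

def ld_clusters(markers, threshold):
    """Connected components of the graph linking markers with r^2 >= threshold."""
    parent = list(range(len(markers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(markers)):
        for j in range(i + 1, len(markers)):
            if r_squared(markers[i], markers[j]) >= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(markers)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

def cluster_scores(markers, clusters):
    """One feature per cluster: mean dosage of member markers for each line.

    Stand-in for a per-cluster principal component, which would also
    handle markers that are correlated with opposite sign.
    """
    n_lines = len(markers[0])
    return [
        [sum(markers[i][k] for i in cluster) / len(cluster) for k in range(n_lines)]
        for cluster in clusters
    ]

# Toy data: three markers (rows) scored on four breeding lines (columns).
markers = [[0, 1, 2, 0], [0, 1, 2, 0], [2, 0, 1, 1]]
clusters = ld_clusters(markers, threshold=0.8)
features = cluster_scores(markers, clusters)
```

Here the first two markers are in complete LD and collapse into one feature, so three markers become two predictors. The all-pairs loop is quadratic in the number of markers, so a genome-scale version would restrict comparisons to nearby markers or use a vectorised library.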

20

Research on Intelligent Optimization of Farm Planting Strategies Driven by Crop Simulation Models: A Case Study of Farm X

Lyu, X.; Yu, R.; Zhu, R.

2026-04-29 plant biology 10.64898/2026.04.27.720996 medRxiv

Top 0.2%

1.5%

To meet the growing demand for precision and intelligent agricultural management, crop simulation models offer substantial potential for optimizing farm planting strategies. By simulating crop growth processes and assessing the effects of different management practices, these models provide a scientific basis for planting decision-making. In this study, the DSSAT model was first used to optimize the planting strategies of Farm X in 2023. Based on the optimized plans, the model was further applied to predict crop yields per unit area for 2024 and to establish the relationships among yield, planting density, and fertilizer application rate. Subsequently, SPSS was employed to develop a regression model describing the relationship among net profit per unit area, planting density, and fertilizer application rate. A genetic algorithm was then used to identify the optimal solutions under different scenarios, generating prescription maps for the optimal planting density and fertilizer application rate for each plot of Farm X in 2024. The results provide a scientific reference for the mechanized and automated implementation of field management practices and support the dual optimization of economic returns and resource use efficiency. This study not only conducted a systematic optimization of Farm X planting strategies for 2023, but also provided detailed predictions and optimized prescriptions for 2024 in a visual and practical form. The proposed approach offers a scientific decision-support tool for farm planting strategy formulation and lays a foundation for the intelligent and automated development of modern agriculture.
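The final optimisation step, a genetic algorithm searching over planting density and fertilizer rate against a fitted profit model, can be sketched as follows. The quadratic `net_profit` surface, its coefficients, the units and the bounds are all hypothetical placeholders for the regression model the study fitted in SPSS; the operators (tournament selection, uniform crossover, Gaussian mutation, elitism) are a common textbook choice and are not taken from the paper.

```python
import random

def net_profit(density, fertilizer):
    """Hypothetical response surface standing in for the fitted regression.

    density in plants per square metre, fertilizer in kg per hectare (assumed units).
    """
    return 100.0 - 0.5 * (density - 8.0) ** 2 - 0.02 * (fertilizer - 150.0) ** 2

BOUNDS = ((4.0, 12.0), (50.0, 300.0))  # assumed feasible ranges

def genetic_search(fitness, bounds, pop_size=30, generations=40, seed=0):
    """Maximise fitness(*individual) within bounds; returns (best, best-fitness history)."""
    rng = random.Random(seed)

    def random_individual():
        return tuple(rng.uniform(lo, hi) for lo, hi in bounds)

    def mutate(individual):
        # Gaussian step of 5% of the range, clipped back into the bounds.
        return tuple(
            min(hi, max(lo, gene + rng.gauss(0.0, 0.05 * (hi - lo))))
            for gene, (lo, hi) in zip(individual, bounds)
        )

    def crossover(a, b):
        # Uniform crossover: each gene comes from one parent at random.
        return tuple(rng.choice(pair) for pair in zip(a, b))

    def tournament(population):
        return max(rng.sample(population, 3), key=lambda ind: fitness(*ind))

    population = [random_individual() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        best = max(population, key=lambda ind: fitness(*ind))
        history.append(fitness(*best))
        children = [best]  # elitism: the current best always survives
        while len(children) < pop_size:
            children.append(mutate(crossover(tournament(population), tournament(population))))
        population = children
    best = max(population, key=lambda ind: fitness(*ind))
    history.append(fitness(*best))
    return best, history

best_plan, best_history = genetic_search(net_profit, BOUNDS)
```

Running the search once per plot, each with its own fitted profit function, would give the per-plot density and fertilizer values that a prescription map records. Because of elitism the best profit never decreases between generations, which makes the history a convenient convergence check.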